A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and its Application to XML Data Stores
نویسندگان
چکیده
Document insertion into a native XML Data Store (XDS) requires to partition the document tree into a number of storage units with limited capacity, such as records on disk pages. As intra partition navigation is much faster than navigation between partitions, minimizing the number of partitions has a beneficial effect on query performance. We present a linear time algorithm to optimally partition an ordered, labeled, weighted tree such that each partition does not exceed a fixed weight limit. Whereas traditionally tree partitioning algorithms only allow child nodes to share a partition with their parent node (i.e. a partition corresponds to a subtree), our algorithm also considers partitions containing several subtrees as long as their roots are adjacent siblings. We call this sibling partitioning. Based on our study of the optimal algorithm, we further introduce two novel, near-optimal heuristics. They are easier to implement, do not need to hold the whole document instance in memory, and require much less runtime than the optimal algorithm. Finally, we provide an experimental study comparing our novel and existing algorithms. One important finding is that compared to partitioning that exclusively considers parent-child partitions, including sibling partitioning as well can decrease the total number of partitions by more than 90%, and improve query performance by more than a factor of two.
منابع مشابه
cc|[email protected] A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and its Application to XML Data Stores
متن کامل
An improved algorithm to reconstruct a binary tree from its inorder and postorder traversals
It is well-known that, given inorder traversal along with one of the preorder or postorder traversals of a binary tree, the tree can be determined uniquely. Several algorithms have been proposed to reconstruct a binary tree from its inorder and preorder traversals. There is one study to reconstruct a binary tree from its inorder and postorder traversals, and this algorithm takes running time of...
متن کاملThe importance of sibling clustering for efficient bulkload of XML document trees
In an XML Data Store (XDS), importing documents from external sources is a very frequent operation. Since a document import consists of a large number of individual node inserts, it is essentially a small bulkload operation. Hence, efficient bulkload support is crucial for XDSs. Essentially, XML bulkload is the transformation of an XML parser’s output into the XDS’s persistent storage structure...
متن کاملA Metaheuristic Algorithm for the Minimum Routing Cost Spanning Tree Problem
The routing cost of a spanning tree in a weighted and connected graph is defined as the total length of paths between all pairs of vertices. The objective of the minimum routing cost spanning tree problem is to find a spanning tree such that its routing cost is minimum. This is an NP-Hard problem that we present a GRASP with path-relinking metaheuristic algorithm for it. GRASP is a multi-start ...
متن کاملAn improved algorithm to reconstruct a binary tree from its inorder and postorder traversals
It is well-known that, given inorder traversal along with one of the preorder or postorder traversals of a binary tree, the tree can be determined uniquely. Several algorithms have been proposed to reconstruct a binary tree from its inorder and preorder traversals. There is one study to reconstruct a binary tree from its inorder and postorder traversals, and this algorithm takes running time of...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006